Workshop, Assessment, and Validity Evidence for Tools Measuring Performance of Knee and Shoulder Arthrocentesis

Introduction Musculoskeletal concerns are common, yet residents at our institution lacked arthrocentesis training. We created a workshop to teach residents knee and shoulder arthrocentesis, developed simulated assessment scenarios (SASs) with tools to measure procedural proficiency, and collected validity evidence. Methods A multidisciplinary group conducted a modified Delphi to define content for the workshop, SASs, and assessment tools. We defined minimum thresholds for competence in knee and shoulder arthrocentesis using the modified borderline-group method. We implemented the workshop and SASs in 2020 and 2021 and analyzed assessment tool scoring for statistical reliability and validity. Our program evaluation included SAS performance, participants’ survey responses, and change in the number of arthrocenteses performed in the internal medicine (IM) resident primary care clinic. Results Sixty-one residents (53 IM, eight physical medicine and rehabilitation [PM&R]) participated. Fifty-two (85%; 46 IM, six PM&R) completed the evaluation survey. We procured data from 48 knee and 65 shoulder SASs for validity evidence. All arthrocentesis SAS performances met the proficiency standard except one resident's shoulder SAS. Validity evidence revealed strong interrater reliability (α = .82 and .77 for knee and shoulder, respectively) and strong relational validity (p < .001 for both procedures). All participants rated workshop quality and usefulness as good or very good. The number of arthrocenteses performed at our institution's primary care clinic increased. Discussion We created a workshop to teach residents arthrocentesis and assessment tools with strong validity and reliability evidence. The workshop was well regarded by residents, who applied their arthrocentesis skills during patient care.


Introduction
Musculoskeletal complaints commonly motivate patients to seek medical care. 1 Arthrocentesis is a cornerstone of evaluating and treating these problems, with the knee and shoulder being the most injected joints. 2 As such, the American College of Physicians (ACP) lists arthrocentesis as an important skill for internists to master. 3 While the clinical objectives of arthrocentesis and intra-articular injections differ, the techniques overlap. In this publication, the term arthrocentesis represents both procedures in accordance with ACP nomenclature.
Familiarity with knee and shoulder arthrocentesis is declining, 4 and national claims data suggest that primary care providers are performing fewer arthrocenteses. 5 Locally, internal medicine (IM) residents at Washington University in St. Louis School of Medicine (WUSM) identified gaps in arthrocentesis training during interviews at the end of the 2018-2019 academic year, and only 10 (7%) IM residents performed as few as 15 arthrocenteses during the 2019-2020 academic year. Reduced physician training, comfort, and performance of these procedures may delay patients' time to diagnosis and treatment.
Studies demonstrate that using models to teach arthrocentesis leads to better skill development than lecture or patient encounters alone. 5 The lack of well-developed measures for procedural competence has hindered arthrocentesis training initiatives. Most arthrocentesis skills assessments have relied on trainees' self-reported confidence levels without data demonstrating that they improve procedure quality. [6][7][8] The few studies that do focus on assessing procedural competence are unpublished, fail to gather validity evidence, or have produced poor-to-moderate reliability measures. 5,[9][10][11][12][13] Behavioral learning theory states that skills are learned through practice and feedback. 14 Following this theory, we created an educational initiative for residents to learn the skills of knee and shoulder arthrocentesis through repetition and expert guidance, as well as assessment tools with strong validity evidence to measure procedural proficiency in knee and shoulder arthrocentesis.

Content Identification
We assembled two teams of internists, physiatrists, and rheumatologists skilled in knee (Amber Deptola, Colin Diffie, Peter McDonnell, John P. Metzler, Nathan P. Olafsen, and Lisa Zickuhr) and shoulder (Dominique Cosco, Colin Diffie, Maria Gonzalez-Mayda, John P. Metzler, Nathan P. Olafsen, and Lisa Zickuhr) arthrocentesis. Each team member consulted the literature and their individual expertise to generate a list of items essential for teaching and assessing knee and shoulder arthrocentesis. 2 We gathered items and reached a consensus on which to include in educational materials through a modified Delphi method. 15 We set our consensus level as an agreement among four of six survey participants (67%). The modified Delphi results (Tables 1-3) served as a content guide for curricular and assessment materials.

Assessment Development
We used the results of the modified Delphi surveys to create assessment tools for knee and shoulder arthrocentesis comprising checklist items and a global rating score ranging from 1 to 9 (Appendices A and B). The checklist outlined critical steps in preprocedural, procedural, and postprocedural arthrocentesis care. Members of our multidisciplinary team (Amber Deptola, Colin Diffie, Peter McDonnell, John P. Metzler, Nathan P. Olafsen, Andrea Ramirez-Gomez, and Lisa Zickuhr) assigned points to checklist items through deductive conversation, drawing upon collective experience in knee and shoulder arthrocentesis. We classified each checklist item as major or minor, where major items were considered essential to the performance of knee and shoulder arthrocentesis. Minor items were thought to be important steps that might not affect the outcome of the procedure if performed incorrectly. Items related to individualizing informed consent, positioning the patient, using anatomic landmarks to identify the injection site, using sterile technique, and needle driving technique were classified as major. They were weighted twice as heavily as the minor items. This strategy assigned a cumulative point value of major items greater than the total of minor items, quantifying the essential nature of the major items within total checklist scores. We then selected items for which trainees could earn partial credit, deciding that dichotomous tasks such as maintaining sterility, managing sharps, and caring for the injection site did not warrant partial credit because learners would either complete or miss them.
The global rating score assessed the overall performance of the knee or shoulder arthrocentesis, incorporating items not referenced on the checklist. We wrote behavioral anchors to describe some components influencing global performance, such as communication skills, procedural flow, and knowledge of the procedure, providing a general depiction of unsatisfactory, satisfactory, and exceptional performances. Final assessment scores represented the sum of checklist items and the global rating score.
We wrote two simulation assessment scenarios (SASs) for knee and shoulder arthrocentesis that reflected scenes commonly encountered during patient care (Appendices C and D). The SASs prompted learners to perform skills identified in our modified Delphi. They emphasized different aspects of arthrocentesis (e.g., the effects of comorbid conditions such as diabetes mellitus on arthrocentesis informed consent) to assess transferability to residents' knowledge and skills.
Members of our multidisciplinary team (Colin Diffie, Jacob Murray, Nathan P. Olafsen, and Lisa Zickuhr) described expectations for competence in knee and shoulder arthrocentesis. We used the modified borderline-group method to define proficiency as cutoff scores on our assessment tools. 16 Lisa Zickuhr wrote hypothetical SAS performance scenarios that the team reviewed in conjunction with SAS performance data collected during the workshops. Hypothetical SAS scores ensured the representation of a variety of score distributions in the analysis. The median score for the borderline groups determined the cutoff score for the knee and shoulder arthrocentesis assessment tools.   Initiates a timeout 6 6 Included Uses lidocaine along projected trajectory 6 2 Excluded Posterior glenohumeral approach-1-2 cm below the posterior acromion and into the glenohumeral groove, towards the acromion process The protocols for this study were reviewed and approved by the WUSM Institutional Review Board (reference number 202004135, May 14, 2020) and classified as exempt.
We created three instructional videos on preprocedural, procedural, and postprocedural arthrocentesis care (Appendices E-G), which trainees watched before the workshop. The first video reviewed the indications, contraindications, risks, alternatives, equipment, and postprocedural guidance of knee and shoulder arthrocentesis. The other two videos demonstrated the steps of performing knee and shoulder arthrocentesis, focusing on the anterior, medial, and lateral approaches of knee arthrocentesis and the posterior glenohumeral, lateral subacromial, and posterolateral subacromial shoulder arthrocentesis.
The in-person workshop lasted 75 minutes and included three stations (Appendix H). Trainees practiced performing arthrocentesis in small groups with no greater than a 2:1 trainee-to-facilitator ratio. We created visual aids (Appendices I-L), provided anatomic models, and used knee and shoulder arthrocentesis simulation models (Sawbones) during the workshops.
After residents had practiced the arthrocentesis techniques, we assessed their knee and shoulder arthrocentesis skills in two SASs. During the SASs, facilitators conveyed instructions to the trainees and assumed the identities of the simulated patients. Trainees interacted with the facilitator as they would a standardized patient for any verbal components of the SASs and demonstrated procedural steps on the simulation model.

Collection of Validity Evidence for Assessment Tools
Cook and Hatala's description of Messick's validity framework structured our approach to gathering validity evidence for our assessment tools. 17 First, we created four videos (Appendices M-P) that represented staged knee and shoulder arthrocentesis performances, including common errors and successes observed during a workshop pilot in 2020. Our multidisciplinary team implemented frame-of-reference training, using the standard for minimal proficiency described below (in the Results section entitled Defining Proficiency) and practicing scoring SASs depicted in the videos. 18 Second, facilitators scored participants' SAS performances using the developed assessment tools. Sometimes, two facilitators graded one trainee's performance simultaneously without referencing each other's scores. Additionally, facilitators scored the residents' performance of knee and shoulder arthrocentesis against a modified version of the Global Rating Index for Technical Skills (GRITS), a global rating scale measuring proficiency in procedural skills in the surgical setting. 19 We omitted components specific to surgery, a process used previously for collecting validity evidence for procedural assessment tools. 20 We used these data to analyze interrater reliability and relational validity.

Workshop Evaluation
To evaluate the outcomes of our workshop, we first determined how many participants met the minimum threshold for competence. Second, we measured the change in residents' selfperceived ability to perform knee and shoulder arthrocentesis through retrospective pre-post Likert scales (Appendix Q). 19,20 Likert-type questions probed for the self-perceived ability to list the indications, contraindications, risks, and benefits of knee and shoulder arthrocentesis; prepare to perform knee and shoulder arthrocentesis; advance the arthrocentesis needle; and provide postprocedural care instructions. We presented learners with QR codes for accessing evaluation surveys at the end of the workshop and provided time for survey completion to enhance the response rate. Lastly, to estimate the workshop's potential effect on clinical practice, we calculated the number of arthrocentesis procedures performed in the resident primary care clinic and how many different residents performed them before and after the workshop.

Statistical Methods
We calculated Krippendorff's alpha to measure the interrater reliability of the total scores on the knee and shoulder arthrocentesis assessment tools. We set the standard of Krippendorff's alpha greater than .70 a priori, in keeping with statistical standards. 21 Power calculations indicated that we needed at least 17 knee and 17 shoulder SASs scored by two facilitators to reach this standard. 22 All observations were used to compute the Pearson correlation between our novel assessment tools' total scores and the modified GRITS scores. When multiple facilitators assessed the same resident, the mean of the scores was used.
We used the Wilcoxon signed rank test to compare retrospective pre-post Likert-scale data obtained from the postworkshop evaluation survey. For measures explicitly related to knee arthrocentesis skills, participants were compared based on whether they had previously completed a knee arthrocentesis procedure. For measures explicitly related to shoulder arthrocentesis skills, participants were compared based on whether they had previously completed a shoulder arthrocentesis procedure. For more general measures applying to knee and shoulder arthrocentesis, participants were separated by whether they had previously completed either a knee or shoulder arthrocentesis procedure. We used chi-square analysis to compare the number of arthrocenteses completed in the IM residents' primary care clinic pre-and postworkshop. All analyses were conducted in R (version 4.1.0; R Core Team).

Defining Proficiency
We defined proficiency in knee and shoulder arthrocentesis as the ability to perform major and minor checklist items without errors that would compromise the integrity of the procedure or increase the patient's risk of complications. Proficient performance did not require accessing the joint or bursal space because sometimes imaging guidance is needed during patient care to accomplish this task and because the decision to appropriately abort a procedure is a form of competence. Competent learners would demonstrate a general awareness of the procedure's steps. They could include a few noncritical mistakes or major mistakes that were corrected during the procedure, as well as mild-to-moderate procedural hesitancy. Proficient learners would maintain a reasonable line of communication with the patient throughout the knee or shoulder arthrocentesis. The median total assessment score (checklist total + global rating score) of borderline SAS performances set the cutoff score for proficiency as 21 of 39 points (range: 16-25 points) for knee arthrocentesis and 23 of 37 points (range: 19-26 points) for shoulder arthrocentesis.

Workshop Evaluation
In total, 113 residents (52 IM residents in 2020, 53 IM residents in 2021, eight PM&R residents in 2021) participated in the workshop. We collected data from 48 knee and 65 shoulder SASs as validity evidence during the 2020 and 2021 workshops. The number of SASs used for statistical analysis represents the subset of workshop participants who were assessed by members of the research team. All documented knee arthrocentesis SAS performances met the minimum score for competence, achieving at least 21 points (median total score: 35 points, range: 26-39 points) on the knee arthrocentesis assessment tool. All but one resident met the minimum score for competence on the shoulder arthrocentesis SAS (median score: 31 points, range: 20-38 points), with one resident earning a total assessment score of 20 points. All others achieved greater than 23 points.
We invited 61 residents to complete the evaluation survey after workshops held in 2021; 52 (85%) participated, 46 from IM and six from PM&R. Analysis of survey responses showed that residents without prior arthrocentesis experience reported a significant increase in their ability to obtain informed consent, conduct ancillary steps in the procedure, and provide patients with instructions for postarthrocentesis care ( p < .001 in all three categories; Table 4). The ancillary steps defined on the survey included listing the needed materials, performing a timeout, positioning the patient, sterilizing the site, and maintaining a no-touch technique. Regardless of prior arthrocentesis experience, all participants reported increased skill levels in identifying knee arthrocentesis anatomic landmarks (previous experience p = .01, no experience p < .001) and needle driving technique (prior experience p = .01, no experience p < .001). In contrast, only those without previous experience reported enhanced skill levels in identifying shoulder arthrocentesis anatomic landmarks and needle driving technique ( p < .001 for both; Table 4). In addition, all participants indicated they would be more likely to offer knee and shoulder arthrocentesis to their patients after the workshop (prior experience p = .002, no experience p < .001).

Discussion
We present educational materials and assessment tools for teaching IM and PM&R residents the procedures of knee and shoulder arthrocentesis. Using a multidisciplinary approach, we defined the knowledge and skills needed to perform these procedures and incorporated them into hands-on workshops grounded in behavioral learning theory. These workshops enhanced residents' procedural skills and patient care practice regardless of their previous experience performing arthrocentesis. Furthermore, we collected strong evidence to support using our assessment tools when measuring knee and shoulder arthrocentesis proficiency during simulated scenarios. The design of our flipped classroom workshop provides flexibility for learning and efficiency for teaching. We created three videos for residents to review before a 75-minute hands-on session. The use of prework increased learner independence, as residents could watch the videos at times, places, and speeds most convenient for them. This approach also maximized the use of limited teaching resources, namely, the time experienced clinicians volunteered to supervise residents' hands-on practice.
Other published curricula require from 90 minutes to several days of dedicated in-class instruction. 4,8,12,13,23,24 For example, Bretagne and colleagues used simulation models, cadavers, and 3 half-days of in-person instruction to improve learners' confidence and theoretical knowledge of arthrocentesis. 13 In comparison, our flipped classroom workshop provided learners with an effective educational experience in 75 minutes. Additionally, the materials can be presented en bloc-first showing the videos to a group of learners before leading them in the guided practice of knee and shoulder arthrocentesis-by educators who prefer a different workshop structure.
The content of our materials is generalizable to learners across specialties. We formed a multidisciplinary group of educators from IM, physiatry, and rheumatology and completed a modified Delphi to capture the group's viewpoints in a standardized way. Furthermore, we studied our materials within two residency programs, IM and PM&R. These methods enhanced the generalizability, reliability, and validity of the materials we created.
Our workshop uniquely incorporates assessment strategies and tools in its materials, whereas previous publications provide either instructional materials or assessment tools. 4,5,[7][8][9][11][12][13]23,24 Our strategy reinforces the methods of curricular design for medical education. 25 In addition, our assessment tools generated strong evidence for reliability and validity, whereas previously published arthrocentesis assessment tools have considered only content validity. 4,9,[11][12][13]23 Herein, we provide educators with materials to train their team of assessors to achieve similarly valid and reliable results. 13 Our methods limit the application of our curricular and assessment materials. First, the modified Delphi method remains subject to error. The results may have changed if other experts had participated and submitted different skills, such as learning how to exchange syringes during the procedure. Furthermore, we selected an agreement threshold of four of the six participants (67%) because it was similar to the standard of 70%. The typical standard of 70% would necessitate consensus among seven of the eight (88%) participants, a level of agreement higher than is commonly utilized in Delphi methodology. 15 Second, to define proficiency in knee and shoulder arthrocentesis, we used the modified borderline-group method, which has been shown to be less precise than the borderline regression method. 26 Reduced precision may be evident in the different cutoff scores for knee and shoulder arthrocentesis. Third, the simulated environment limits learners' exposure to anatomic variations and interferes with applying our assessment tools to workplace-based observations. Fourth, IM and PM&R residents from a single institution participated in the workshop, restricting generalizability to other disciplines and locations.
Future initiatives should assess the performance of the arthrocentesis assessment tools in the patient care environment. This could provide validity evidence for their use in measuring procedural proficiency during workplace-based assessment and create opportunities to correlate SAS performance with procedural skills in patient care.
In summary, we offer educational materials and assessment tools for teaching IM and PM&R residents how to perform knee and shoulder arthrocentesis. Our work publishes the first assessment tools with strong validity evidence for these procedures. In addition, it demonstrates how these educational initiatives rejuvenate the provision of arthrocentesis in the clinical environment, enhancing the care of patients with musculoskeletal conditions.